Abstract. Sampling a large network with a given probability distribution has
been identified as a useful operation. In this paper we propose a distributed
algorithm for sampling networks, so that nodes are selected at a special node,
called the source, with a given probability distribution. This algorithm is based
on a new class of random walks, that we call Drifting Random Walks (DRW). A DRW
starts at the source and always moves away from it.

We propose a DRW algorithm for connected networks that selects a node with
any desired probability distribution. A drawback of this algorithm is that it needs
preprocessing. When the probability distribution is distance based (i.e., the
probability of selecting a node is a function of its distance to the source), variants
of the DRW algorithm without preprocessing are proposed.

A DRW algorithm has the novel key features that (1) it always finishes in a
number of hops bounded by the network diameter, and (2) it selects a node with the
exact probability distribution. Furthermore, unlike previous Markovian (e.g., classical
random walks and epidemic) approaches, DRW does not need to stabilize and
can efficiently be used as a service to obtain multiple independent samples.



1 Introduction 



Sampling a large network with a given distribution has been identified as a useful
operation. For instance, sampling nodes with uniform probability is the building block
of epidemic information spreading [10, 9]. Similarly, sampling with a probability that
depends on the distance to a given node [4, 14] is useful to construct small world
network topologies [11, 6, 3]. Other applications that can benefit from distance-based
node sampling are landmark-less network positioning systems like NetICE9 [13], which
samples nodes with special properties to assign synthetic coordinates to nodes.
Currently, there is an increasing interest in obtaining a representative (unbiased)
sample from the users of online social networks [7]. In this paper we propose a
distributed algorithm for sampling networks with a desired probability distribution.
* This research was supported in part by the Comunidad de Madrid grant S2009TIC-1692, 
Spanish MICINN grant TEC2011-29688-C02-01, and National Natural Science Foundation 
of China grant 61020106002. 



Related Work One technique to implement distributed sampling is to use gossiping
between the network nodes. Jelasity et al. [9] present a general framework to implement
a uniform sampling service using gossip-based epidemic algorithms. Bertier et al. [3]
implement uniform sampling and DHT services using gossiping. As a side result, they
sample nodes with a distribution that is close to Kleinberg's harmonic distribution (one
instance of a distance-dependent distribution). Another gossip-based sampling service
that gets close to Kleinberg's harmonic distribution has been proposed by Bonnet et
al. [4]. However, when using gossip-based distributed sampling as a service, it is shown
in [15] that only partial independence between samples can be guaranteed without
re-executing the gossip algorithm.

Another popular distributed technique to sample a network is the use of random
walks [18]. Most random-walk based sampling algorithms do uniform sampling [1, 7],
usually having to deal with the irregularities of the network. Sampling with arbitrary
probability distributions can be achieved with random walks by weighting the hop
probabilities, for instance using Metropolis-Hastings random walks [12, 8].

In [14], it was shown how sampling with an arbitrary probability distribution can be
done without communication if a uniform sampling service is available. In that work,
like in all the previous approaches, the desired probability distribution is reached when
the stationary distribution of a Markov process is reached. The number of iterations (or
hops of a random walk) required to reach this situation (the warm-up time) depends on
the parameters of the network and the desired distribution, but it is not negligible. For
instance, in [15] it is found by simulation that, to achieve no more than 1% error, in a
torus of 4096 nodes at least 200 hops of a random walk are required for the uniform
distribution, and 500 hops are required for a distribution proportional to the inverse of
the distance. In the light of these results, Markovian approaches seem to be inefficient
to implement a sampling service, especially if multiple samples are desired.

Contributions In this paper we present an efficient distributed algorithm to implement 
a sampling service. The basic technique used for sampling is a new class of random 
walks that we call Drifting Random Walks (DRW). A DRW starts at a special node, 
called the source, and always moves away from it. The sampling process in the DRW 
algorithm works essentially as follows. A DRW always starts at the source node. When 
the DRW reaches a node x, the DRW stops at that node with a stay probability. If the 
DRW stops at node x, then x is the node selected by the sampling. If the DRW does not 
stop at x, it jumps to a neighbor of x. To do so, the DRW chooses only among neighbors
that are at a larger distance from the source than x. (The probability of jumping to each
of these neighbors is not necessarily the same.) 

We propose a DRW algorithm that samples any connected network with any prob- 
ability distribution (given as weights assigned to the nodes). The drawback of this ap- 
proach is that, before starting the sampling, some preprocessing is required. This pre- 
processing involves building a spanning tree in the network, and performing a flooding 
and a convergecast over the tree. Additionally, each node has to maintain state data to be 
used by the DRW. However, this is compensated by the facts that, once the preprocess- 
ing is completed, multiple independent samplings with the exact desired distribution 
can be efficiently performed, each taking at most D hops (where D is the diameter of 
the spanning tree). 



When the probability distribution is distance-based and nodes are at integral
distances (measured in hops) from the source, variants of the DRW algorithm that need
neither preprocessing nor state data are proposed. In a distance-based probability
distribution all the nodes at the same distance from the source node are selected with
the same probability. (Observe that the uniform and Kleinberg's harmonic distributions
are special cases of distance-based probability distributions.) In these networks, each
node at distance $k > 0$ from the source has neighbors (at least) at distance $k - 1$.
We can picture nodes at distance $k$ from the source as positioned on a ring at distance
$k$ from the source. The center of all the rings is the source, and the radius of each
ring is one unit larger than the previous one. Using this graphical image, we refer to
the networks of this family as concentric rings networks.

Observe that every connected network can be seen as a concentric rings network.
For instance, one can find the breadth-first search (BFS) tree rooted at the source and
use the number of hops to the source in this tree as the distance. This topology can also
be imposed in real networks. For instance, consider a radio network in which each node
has a fixed position assigned (say, with a GPS). Then, fixing a source node, the nodes in
the $k$-th concentric ring can be the nodes whose (Euclidean) distance to the source is in
the interval $(k - 1, k]$. If the communication radius is reasonably large, the requirements
of the concentric rings topology model will be satisfied.
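
The BFS view of a connected network as concentric rings can be sketched as follows. This is only an illustration: the adjacency-dictionary format and the function name `concentric_rings` are assumptions made here, not part of the paper.

```python
from collections import deque

def concentric_rings(adj, source):
    """Group nodes by BFS hop distance from `source`.

    `adj` maps each node to an iterable of its neighbors (hypothetical format).
    Returns a list `rings` where rings[k] is the set of nodes at distance k.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit gives the hop distance
                dist[v] = dist[u] + 1
                queue.append(v)
    rings = [set() for _ in range(max(dist.values()) + 1)]
    for node, k in dist.items():
        rings[k].add(node)
    return rings

# Small illustrative network with source "s".
adj = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c", "d"],
       "c": ["a", "b", "e"], "d": ["b"], "e": ["c"]}
rings = concentric_rings(adj, "s")
```

Here `rings[0]` holds only the source, and every node in `rings[k]` with k > 0 has at least one neighbor in `rings[k - 1]`, which is the property the concentric rings model requires.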

The first variant of the DRW algorithm we propose samples with a distance-based
distribution in a network with grid topology. In this network, the source node is at
position (0, 0) and lattice (Manhattan) distance is used. This grid contains all the nodes
that are at a distance no more than the radius $R$ from the source (the grid has hence a
diamond shape). The algorithm we derive assigns a stay probability to each node that only
depends on its distance from the source. However, the hop probabilities depend on the
position $(i, j)$ of the node and the position of the neighbors to which the DRW can
jump. We formally prove that the desired distance-based sampling probability
distribution is achieved. Moreover, since every hop of the DRW in the grid moves one unit
of distance away from the source, the sampling is completed after at most $R$ hops.

We have proposed a second variant of the DRW algorithm that samples with
distance-based distributions in concentric rings networks with uniform connectivity.
These are networks in which all the nodes in each ring $k$ have the same number of
neighbors in ring $k - 1$ and the same number in ring $k + 1$. Like the grid variant,
this variant is also proved to finish with the desired distribution in at most $R$ hops,
where $R$ is the number of rings.

Unfortunately, in general, concentric rings networks do not have uniform connectivity.
This case is handled by creating, on top of the concentric rings network, an overlay
network that has uniform connectivity. In the resulting network, the above DRW variant
can be used. We propose a distributed algorithm that, if it completes successfully, builds 
the desired overlay network. We have found via simulations that this algorithm succeeds 
in building the overlay network in a large number of cases. 

In summary, DRW can be used to implement an efficient sampling service because,
unlike previous Markovian (e.g., classical random walks and epidemic) approaches,
(1) it always finishes in a number of hops bounded by the network diameter, (2) selects
a node with the exact probability distribution, and (3) does not need warm-up
(stabilization) to converge to the desired distribution. In the case that preprocessing
is needed, this only has to be executed once, independently of the number of samples taken.

The rest of the paper is structured as follows. In Section 2 we introduce concepts
and notation that will be used in the rest of the paper. In Section 3 we present the DRW
algorithm for a connected network. In Sections 4 and 5 we describe the DRW algorithm
on two concentric rings networks: grids and topologies with uniform connectivity.
Finally, in Section 6 we present the simulation-based study of the algorithm for
concentric rings topologies without uniform connectivity.



2 Definitions and Model 

Connected Networks In this paper we only consider connected networks. This family
includes most of the potentially interesting networks we can find. In every network,
we use $N$ to denote the set of nodes and we assume that there is a special node in the
network, called the source and denoted by $s$. We assume that each node $x \in N$ has
an associated weight $w(x) > 0$. Furthermore, each node knows its own weight. The
weights are used to obtain the desired probability distribution $p$, so that the probability
of selecting a node $x$ is proportional to $w(x)$. Let us denote $\eta = \sum_{j \in N} w(j)$.
Then, the probability of selecting $x \in N$ is $p(x) = w(x)/\eta$. (In the simplest case,
$w(x) = p(x), \forall x$ and $\eta = 1$.)

DRW in Connected Networks As mentioned, in order to use DRW to sample connected
networks, some preprocessing is done. This involves constructing a spanning tree in the
network and performing a weight aggregation process. After the preprocessing, DRW
are used for sampling. A DRW starts from the source, jumping to one of its neighbors in
the tree. When the DRW reaches a node $x \in N$, it selects $x$ as the sampled vertex with
probability $q(x)$, which we call the stay probability. If $x$ is not selected, a neighbor
$y$ of $x$ in the tree is chosen, using for that a collection of hop probabilities $h(x, y)$.
The values of $q(x)$ and $h(x, y)$ are computed in the preprocessing and stored at $x$. The
probability of reaching a node $x \in N$ is called the visit probability, denoted $v(x)$.

Concentric Rings Networks We also consider a subfamily of the connected networks,
which we call concentric rings networks. These are networks in which the nodes of
$N$ are at integral distances from $s$. In these networks, no node is at a distance from $s$
larger than a radius $R$. For each $k \in [1, R]$, we use $\mathbb{R}_k \neq \emptyset$ to denote the
set of nodes at distance $k$ from $s$, and $n_k = |\mathbb{R}_k|$. These networks can be seen as a
collection of concentric rings at distances 1 to $R$ from the source, which is the common
center of all rings. For that reason, we call the set $\mathbb{R}_k$ the ring at distance $k$. For each
$x \in \mathbb{R}_k$, $\gamma_k(x) > 0$ is the number of neighbors of node $x$ at distance $k - 1$ from $s$
(which is only $s$ itself if $k = 1$), and $\delta_k(x)$ is the number of neighbors of node $x$ at
distance $k + 1$ from $s$ (which is 0 if $k = R$).

The concentric rings networks considered must satisfy the additional property that
the probability distribution is distance based. This means that every node $x \in \mathbb{R}_k$ has
the same probability $p_k$ to be selected, for all $k \in [1, R]$. This property makes it
possible, in the subfamilies defined below, to avoid the preprocessing required for
connected networks.



Grids A first subfamily of concentric rings networks considered is the grid with lattice
distances. In this network, the source is at position (0, 0) of the grid, and it contains all
the nodes $(i, j)$ so that $i, j \in [-R, R]$ and $|i| + |j| \le R$. For each $k \in [1, R]$, the set of
nodes in ring $k$ is $\mathbb{R}_k = \{(i, j) : |i| + |j| = k\}$. The neighbors of a node $(i, j)$ are the
nodes $(i - 1, j)$, $(i + 1, j)$, $(i, j - 1)$, and $(i, j + 1)$ (that belong to the grid).

Uniform Connectivity The second subfamily considered is formed by the concentric
rings networks with uniform connectivity. These networks satisfy that

$$\forall k, \forall x, y \in \mathbb{R}_k, \quad \delta_k(x) = \delta_k(y) \wedge \gamma_k(x) = \gamma_k(y). \qquad (1)$$

In other words, all nodes of ring $k$ have the same number of neighbors $\delta_k$ in ring $k + 1$
and the same number of neighbors $\gamma_k$ in ring $k - 1$.

DRW in Concentric Rings Networks The behavior of a DRW was already described.
In the algorithm that we will present in this paper for concentric rings networks we
guarantee that, for each $k$, all the nodes in $\mathbb{R}_k$ have the same visit probability $v_k$ and
the same stay probability $q_k$. A DRW starts from the source, jumping to one of its
neighbors (in the first ring). When it reaches a node $x \in \mathbb{R}_k$, it selects $x$ as the
sampled vertex with stay probability $q_k$. If $x$ is not selected, a neighbor
$y \in \mathbb{R}_{k+1}$ of $x$ is chosen.

The desired distance-based probability distribution is given by the values $p_k$,
$k \in [1, R]$, where it must hold that $\sum_{k=1}^{R} n_k \cdot p_k = 1$. The problem to be solved is
to define the stay and hop probabilities so that the probability of selecting a node
$x \in \mathbb{R}_k$ is $p_k$.

Observation 1 If for all $k \in [1, R]$ the visit $v_k$ and stay $q_k$ probabilities are the same
for all the nodes in $\mathbb{R}_k$, the DRW samples with the desired probability iff $p_k = v_k \cdot q_k$.



3 Sampling in a Connected Network 

In this section, we present a DRW algorithm that can be used to sample any connected 
network. As mentioned, in addition to connectivity, it is required that each node knows 
its own weight. A node will be selected with probability proportional to its weight. 

DRW Algorithm The DRW algorithm for these networks works as follows.
Building a spanning tree The algorithm first builds a spanning tree of the network.
A feature of the algorithm is that, if several nodes want to act as sources for DRW, they
can all share the same spanning tree. Hence only one tree for the whole network has to
be built. The algorithm used for the tree construction is not important for the correctness
of the DRW algorithm. There are several well known algorithms [2] that can be used to
build the spanning tree.

Weight aggregation Once the spanning tree is in place, a source that wants to use DRW 
for sampling has to trigger a process in which nodes compute and store aggregated 
weights. This is a preliminary process that, like the construction of the tree, has to be 
executed only once. It involves flooding the whole tree and collecting data back up to
the source.

Figure 1 describes the behavior of a source node (left side), and of the rest of the
tree nodes (right side), respectively, in the weight aggregation process. Before starting
a DRW, a source node has to make sure that each node obtains the accumulated weights
of its subtrees. To achieve that, the source node floods the tree sending a REQUEST
message (Line 3) to its children. The children of a source node are its neighbors in
the tree. The rest of the nodes, when they receive a REQUEST message, consider as their
parent, with respect to the source $s$, the sender of the message, and consequently their
children are the rest of their neighbors. When a copy of the REQUEST message reaches
a leaf node, that is, a node without children (Line 16), the node returns its weight to its
parent in a RESPONSE message (Line 18). Otherwise, when the reached node is not a leaf,
the REQUEST message is forwarded to its children (Line 19). When a node receives
a RESPONSE message from one of its children containing the accumulated weight of
the child branch (Line 20), it stores this value (Line 21). When this node has received
the RESPONSE messages from all of its children (Line 22), it adds its own weight and
the accumulated weights of its children (Line 23), and it sends a RESPONSE message
containing this value to its parent (Line 24). At the end, this process stops when the
source $s$ receives all the RESPONSE messages of its children (Lines 6-9). This process
is executed only once. After that, many DRW can start from the source (Lines 10-14).
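
The effect of the REQUEST/RESPONSE convergecast can be sketched centrally as a post-order traversal of the tree. This is a minimal sketch, not the distributed protocol itself: the `children` and `weight` dictionaries and the function name are assumptions made for illustration, and the source is given weight 0 because it is never selected by a DRW.

```python
def aggregate_weights(children, weight, source):
    """Return T, where T[x] = w(x) + sum of T over the children of x."""
    T = {}
    # Iterative post-order: a node is finalized only after all its children,
    # mirroring a node that waits for every RESPONSE before replying.
    stack = [(source, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            T[node] = weight[node] + sum(T[c] for c in children.get(node, []))
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in children.get(node, []))
    return T

# Illustrative tree rooted at the source "s".
children = {"s": ["a", "b"], "a": ["c", "d"]}
weight = {"s": 0, "a": 1, "b": 2, "c": 3, "d": 4}
T = aggregate_weights(children, weight, "s")
eta = sum(T[c] for c in children["s"])   # normalization kept at the source
```

In this example node `a` stores the accumulated weights of its children `c` and `d`, and the source ends up with the total weight `eta` of all selectable nodes.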



DRW sampling The spanning tree and the precomputed aggregated weights are used
by the DRW to perform the samplings (as many as needed). This process is detailed in
Figure 1 for the source node (Lines 10-14), and for the rest of the nodes (Lines 25-31).
The length of the DRW is bounded by the diameter $D$ of the tree.
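
A single DRW over the tree can be simulated with a few lines of code. The sketch below assumes the accumulated weights `T` are already available (here they are written out by hand for a small tree); the data layout and the name `drw_sample` are illustrative assumptions, not the authors' implementation.

```python
import random

def drw_sample(children, weight, T, source, rng=random):
    """Run one DRW on the tree and return the selected node."""
    kids = children[source]
    # First hop: child x is chosen with probability T(x)/eta.
    x = rng.choices(kids, weights=[T[c] for c in kids])[0]
    while True:
        # Stay with probability q(x) = w(x)/T(x); a leaf has q(x) = 1.
        if rng.random() < weight[x] / T[x]:
            return x
        kids = children[x]
        # Hop to child y with probability T(y)/(T(x) - w(x)).
        x = rng.choices(kids, weights=[T[c] for c in kids])[0]

# Illustrative tree: s -> a, b and a -> c, d, with hand-computed subtree sums.
children = {"s": ["a", "b"], "a": ["c", "d"]}
weight = {"a": 1, "b": 2, "c": 3, "d": 4}
T = {"a": 8, "b": 2, "c": 3, "d": 4}
sample = drw_sample(children, weight, T, "s", random.Random(1))
```

Each call is an independent sample and performs at most as many hops as the depth of the tree, which is the bounded-length property stated above.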

Analysis We show now that the algorithm proposed performs sampling with the desired
probability distribution. Recall that $T(x)$ is the accumulated weight of the subtree
rooted at $x$, as computed in the weight aggregation process.

Theorem 1. Each node $x \in N$ is visited by the DRW with probability $v(x) = \frac{T(x)}{\eta}$
and selected by the DRW algorithm with probability $p(x) = \frac{w(x)}{\eta}$.

Proof. We prove the claim by induction on the number of hops from the source $s$ to
node $x$ in the spanning tree. The base case is when the node $x$ is at 1 hop of $s$ (i.e., it is
a child of $s$). Then, $x$ is visited with probability $\frac{T(x)}{\eta}$, since $x$ is chosen by $s$ with this
probability in the first hop of the DRW (Line 13). If $x$ is visited, then it is selected with
probability $q(x) = w(x)/T(x)$ (Line 26). Then, the probability of being selected is

$$\Pr[\text{select } x] = \frac{T(x)}{\eta} \cdot \frac{w(x)}{T(x)} = \frac{w(x)}{\eta}.$$

The induction hypothesis assumes the claim true for a node $x$ at distance $i$ from $s$.
We consider a child $y$ of $x$, which is at distance $i + 1$ from $s$.

$$\Pr[\text{visit } y] = v(x)\,(1 - q(x))\,\frac{T(y)}{T(x) - w(x)},$$

where $1 - q(x)$ is the probability of not staying at node $x$, and $\frac{T(y)}{T(x) - w(x)}$ is the
probability to choose the child node $y$ in the next hop of the DRW. Then,

$$v(y) = \frac{T(x)}{\eta}\left(1 - \frac{w(x)}{T(x)}\right)\frac{T(y)}{T(x) - w(x)}$$
 1  task Tree_Source(s)
 2    DRW_enabled ← false
 3    send to children REQUEST(s)
 4    when RESPONSE(s, chld, sum) received
 5      T(chld) ← sum
 6      if received RESPONSE from all children
 7      then
 8        η ← Σ_{x ∈ child(s)} T(x)
 9        DRW_enabled ← true
10    when DRW_START received
11      wait until DRW_enabled
12      choose a node x ∈ child(s)
13        with probability T(x)/η
14      send DRW_MSG(s) to x

15  task Tree_Node(x, parent)
16    when REQUEST(s) received
17      if x is a leaf then
18        send RESPONSE(s, x, w(x)) to parent
19      else send REQUEST(s) to children
20    when RESPONSE(s, chld, sum) received
21      T(chld) ← sum
22      if received RESPONSE from all children then
23        T(x) ← w(x) + Σ_{y ∈ child(x)} T(y)
24        send to parent RESPONSE(s, x, T(x))
25    when DRW_MSG(s) received
26      with probability q(x) = w(x)/T(x) do
27        select node x and report to source s
28      otherwise
29        choose a node y ∈ child(x)
30          with probability T(y)/(T(x) - w(x))
31        send DRW_MSG(s) to y

Fig. 1. DRW algorithm for connected networks (left, Lines 1-14: code for source s; right, Lines 15-31: code for node x).



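$$v(y) = \frac{T(x)}{\eta} \cdot \frac{T(x) - w(x)}{T(x)} \cdot \frac{T(y)}{T(x) - w(x)} = \frac{T(y)}{\eta}$$

and

$$\Pr[\text{select } y] = v(y)\,q(y) = \frac{T(y)}{\eta} \cdot \frac{w(y)}{T(y)} = \frac{w(y)}{\eta}.$$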
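
Theorem 1 can be checked numerically on a small tree by propagating exact visit probabilities instead of simulating walks. This is a sketch under assumptions: the tree, the integer weights, and the function name `selection_probabilities` are chosen here for illustration only.

```python
from fractions import Fraction

def selection_probabilities(children, weight, source):
    """Exact Pr[select x] for every non-source node under the tree DRW rules."""
    T = {}

    def subtree(x):
        # Accumulated weight of the subtree rooted at x.
        T[x] = weight[x] + sum(subtree(c) for c in children.get(x, []))
        return T[x]

    eta = sum(subtree(c) for c in children[source])
    select = {}
    # Each entry is (node, visit probability); first hop uses T(x)/eta.
    frontier = [(c, Fraction(T[c], eta)) for c in children[source]]
    while frontier:
        x, v = frontier.pop()
        q = Fraction(weight[x], T[x])            # stay probability q(x)
        select[x] = v * q
        rest = T[x] - weight[x]                  # weight below x
        for y in children.get(x, []):
            frontier.append((y, v * (1 - q) * Fraction(T[y], rest)))
    return select, eta

children = {"s": ["a", "b"], "a": ["c", "d"]}
weight = {"a": 1, "b": 2, "c": 3, "d": 4}
select, eta = selection_probabilities(children, weight, "s")
```

For this tree every entry of `select` equals the node weight divided by `eta`, in agreement with the theorem.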



4 Sampling in a Grid 

If the algorithm for connected networks is applied to a grid, given its regular
structure, the construction of the spanning tree could be done without any communication
among nodes, but the weight aggregation process has to be done as before. However,
we show in this section that all preprocessing and the state data stored in each node
can be avoided if the probability distribution is based on the distance. The DRW sampling
process was described in Section 2, and we only redefine the stay and hop probabilities.

From Observation 1, the key for correctness is to assign stay and hop probabilities
that guarantee visit and stay probabilities that are homogeneous for all the nodes at the
same distance from the source.



Stay probability For $k \in [1, R]$, the stay probability of every node $(i, j) \in \mathbb{R}_k$ is
defined as

$$q_k = \frac{n_k \cdot p_k}{\sum_{j=k}^{R} n_j \cdot p_j} = \frac{n_k \cdot p_k}{1 - \sum_{j=1}^{k-1} n_j \cdot p_j}.$$

As required by Observation 1, all nodes in $\mathbb{R}_k$ have the same $q_k$. Note that the
probability $q_R = 1$, as one may expect.
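
As a worked example of the stay probabilities, the sketch below evaluates $q_k$ for a grid of radius 4 with a uniform target distribution. It relies on the fact that ring $k$ of the diamond-shaped grid has $4k$ nodes; the function name and list layout (index 0 unused) are assumptions made for illustration.

```python
from fractions import Fraction

def stay_probabilities(n, p):
    """q_k = n_k p_k / sum_{j>=k} n_j p_j for k = 1..R (index 0 unused)."""
    R = len(n) - 1
    q = [None] * (R + 1)
    tail = Fraction(0)
    for k in range(R, 0, -1):        # accumulate the tail sum from ring R inwards
        tail += n[k] * p[k]
        q[k] = n[k] * p[k] / tail
    return q

R = 4
n = [None] + [4 * k for k in range(1, R + 1)]   # grid rings have 4k nodes
total = sum(n[1:])                               # 40 nodes besides the source
p = [None] + [Fraction(1, total)] * R            # uniform target distribution
q = stay_probabilities(n, p)
```

The values grow towards the outer ring and reach `q[R] == 1`, so a walk that arrives at the last ring always stops there.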



Hop probability In the grid, the hops of a DRW increase the distance from the source
by one unit. We want to guarantee that the visiting probability is the same for each node
at the same distance, to use Observation 1. To do so, we need to observe that nodes $(i, j)$
over the axes (i.e., with $i = 0$ or $j = 0$) have to be treated as a special case, because they
can only be reached via a single path, while the other nodes can be reached via several
paths. To simplify the presentation, and since the grid is symmetric, we give the hop
probabilities for one quadrant only (the one in which nodes have both coordinates
non-negative). The hop probabilities in the other three quadrants are similar. The first hop
of each DRW chooses one of the four links of the source node with the same probability
1/4. We have three cases when calculating the hop probabilities from a node $(i, j)$ at
distance $k$, $0 < k < R$, to node $(i', j')$.

- Case A: The edge from $(i, j)$ to $(i', j')$ is in one axis. The hop probability of this
link is set to $h_k((i, j), (i', j')) = \frac{i + j}{i + j + 1} = \frac{k}{k + 1}$.

- Case B: The edge from $(i, j)$ to $(i', j')$ is not in the axes, $i' = i + 1$, and $j' = j$.
The hop probability of this link is set to $h_k((i, j), (i + 1, j)) = \frac{2i + 1}{2(i + j + 1)} = \frac{2i + 1}{2(k + 1)}$.

- Case C: The edge from $(i, j)$ to $(i', j')$ is not in the axes, $i' = i$, and $j' = j + 1$.
The hop probability of this link is set to $h_k((i, j), (i, j + 1)) = \frac{2j + 1}{2(i + j + 1)} = \frac{2j + 1}{2(k + 1)}$.

It is easy to check that the hop probabilities of a node add up to one. 
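
The check that the hop probabilities add up to one can be automated. The sketch below assumes the hop probabilities $\frac{k}{k+1}$ for an edge along an axis and $\frac{2i+1}{2(k+1)}$, $\frac{2j+1}{2(k+1)}$ for the other edges, extended to all four quadrants by symmetry through absolute values; the function name `grid_hops` is hypothetical.

```python
from fractions import Fraction

def grid_hops(i, j):
    """Hop probabilities out of grid node (i, j) != (0, 0), keyed by target."""
    k = abs(i) + abs(j)
    out = {}
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if abs(a) + abs(b) != k + 1:
            continue                                   # only moves away from the source
        if (di and j == 0) or (dj and i == 0):
            out[(a, b)] = Fraction(k, k + 1)           # Case A: edge along an axis
        else:
            moved = abs(i) if di else abs(j)           # coordinate that grows
            out[(a, b)] = Fraction(2 * moved + 1, 2 * (k + 1))   # Cases B and C
    return out

example = grid_hops(2, 1)      # interior node at distance 3
on_axis = grid_hops(3, 0)      # axis node at distance 3
```

An interior node has two outgoing links, while an axis node has three: one along the axis and one into each adjacent quadrant.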

Analysis In the following we prove that the DRW that uses the above stay and hop
probabilities selects nodes with the desired sample probability.

Lemma 1. All nodes at the same distance $k > 0$ to the source have the same visit
probability $v_k$.

Proof. The proof uses induction. The base case is $k = 1$, which is trivial since the
probability of visiting each of the four nodes at distance 1 from the source $s$ is $v_1 = 1/4$.
Assuming that all nodes at distance $k > 0$ have the same visit probability $v_k$, we prove
the case of distance $k + 1$. Recall that the stay probability is the same $q_k$ for all nodes
at distance $k$.

The probability to visit a node $x = (i', j')$ at distance $k + 1$ depends on whether $x$
is on an axis or not. If it is in one axis it can only be reached from its only neighbor
$(i, j)$ at distance $k$. This happens with probability (Case A),

$$\Pr[\text{visit } x] = v_k (1 - q_k) \frac{i + j}{i + j + 1} = v_k (1 - q_k) \frac{k}{k + 1}.$$

If $x$ is not in an axis, it can be reached from two nodes, $(i' - 1, j')$ and $(i', j' - 1)$, at
distance $k$ (Cases B and C). Hence, the probability of reaching $x$ is then

$$\Pr[\text{visit } x] = v_k (1 - q_k) \frac{2(i' - 1) + 1}{2(k + 1)} + v_k (1 - q_k) \frac{2(j' - 1) + 1}{2(k + 1)} = v_k (1 - q_k) \frac{k}{k + 1},$$

where the last equality uses $i' + j' = k + 1$.

Hence, in both cases the visit probability of a node $x$ at distance $k + 1$ is
$v_{k+1} = v_k (1 - q_k) \frac{k}{k + 1}$. This proves the induction and the claim.



Theorem 2. Every node at distance $k$ from the source is selected with probability $p_k$.

Proof. If a node is visited at distance $k$, it is because no node was selected at
distance less than $k$, since a DRW always moves away from the source. Hence,
$\Pr[\exists x \in \mathbb{R}_k \text{ visited}] = 1 - \sum_{j=1}^{k-1} n_j p_j$. Since all the $n_k$ nodes in $\mathbb{R}_k$ have the
same probability to be visited (from the previous lemma) and the stay probability is
$q_k = \frac{n_k p_k}{\sum_{j=k}^{R} n_j p_j}$, the probability of selecting a particular node $x$ at distance $k$
from the source is

$$\Pr[\text{select } x] = \left(1 - \sum_{j=1}^{k-1} n_j p_j\right) \frac{1}{n_k} \cdot \frac{n_k p_k}{\sum_{j=k}^{R} n_j p_j} = p_k,$$

where it has been used that $\sum_{j=1}^{R} n_j p_j = 1$ and that $1 - \sum_{j=1}^{k-1} n_j p_j = \sum_{j=k}^{R} n_j p_j$.
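
Theorem 2 can also be verified exactly on a small grid by pushing visit probabilities ring by ring. The sketch below is an illustration under assumptions: it uses hop probabilities $\frac{k}{k+1}$ along an axis and $\frac{2i+1}{2(k+1)}$, $\frac{2j+1}{2(k+1)}$ elsewhere (mirrored to all quadrants), takes a target distribution proportional to the inverse of the distance, and all function names are hypothetical.

```python
from fractions import Fraction

def hops(i, j):
    """Hop probabilities out of (i, j) != (0, 0), mirrored to every quadrant."""
    k = abs(i) + abs(j)
    out = {}
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if abs(a) + abs(b) != k + 1:
            continue                                    # never move towards the source
        if (di and j == 0) or (dj and i == 0):
            out[(a, b)] = Fraction(k, k + 1)            # edge along an axis
        else:
            moved = abs(i) if di else abs(j)
            out[(a, b)] = Fraction(2 * moved + 1, 2 * (k + 1))
    return out

def grid_selection(R, p):
    """Exact Pr[select (i, j)] for the grid DRW; p[k] is the target for ring k."""
    n = [0] + [4 * k for k in range(1, R + 1)]          # ring k has 4k nodes
    visit = {node: Fraction(1, 4) for node in ((1, 0), (-1, 0), (0, 1), (0, -1))}
    select = {}
    for k in range(1, R + 1):
        q = n[k] * p[k] / sum(n[j] * p[j] for j in range(k, R + 1))
        nxt = {}
        for node, v in visit.items():
            select[node] = v * q
            if k < R:
                for tgt, h in hops(*node).items():
                    nxt[tgt] = nxt.get(tgt, 0) + v * (1 - q) * h
        visit = nxt
    return select

R = 5
p = [None] + [Fraction(1, 4 * R * k) for k in range(1, R + 1)]   # p_k proportional to 1/k
select = grid_selection(R, p)
```

Every node of the grid other than the source appears in `select`, and its selection probability matches the target of its ring.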



5 Sampling in a Concentric Rings Network with Uniform 
Connectivity 

In this section we derive a variant of the DRW algorithm to sample a concentric rings
network with uniform connectivity, where all preprocessing is avoided, and only a small
(and constant) amount of data is stored in each node. Recall that uniform connectivity
means that all nodes of ring $k$ have the same number of neighbors $\delta_k$ in ring $k + 1$ and
the same number of neighbors $\gamma_k$ in ring $k - 1$.

Distributed algorithm The general behavior of the DRW algorithm for these networks
was described in Section 2. In order to guarantee that the algorithm is fully distributed,
and to reduce the amount of data a node must know a priori, a node at distance $k$ that
sends the DRW to a node in ring $k + 1$ piggybacks some information. In more detail,
when a node in ring $k$ receives the DRW from a node of ring $k - 1$, it also receives
the probability $v_{k-1}$ of the previous step, and the values $p_{k-1}$, $n_{k-1}$, and $\delta_{k-1}$. Then,
it calculates the values of $n_k$, $v_k$, and $q_k$. After that, the DRW algorithm uses the stay
probability $q_k$ to decide whether to select the node or not. If it decides not to select it, it
chooses a neighbor in ring $k + 1$ with uniform probability. Then, it sends to this node
the probability $v_k$ and the values $p_k$, $n_k$, and $\delta_k$, piggybacked in the DRW.

Figure 2 shows the code of the DRW algorithm. The source $s$ sends the DRW with
values $v_0 = 1$, $n_0 = 1$, $p_0 = 0$, and $\delta_0$. Each node in ring $k$ must only know initially
the values $\delta_k$, $\gamma_k$ and $p_k$. Observe that $n_k$ (number of nodes in ring $k$) can be locally
calculated as $n_k = n_{k-1} \delta_{k-1} / \gamma_k$. The correctness of this computation follows from
the uniform connectivity assumption (Eq. (1)).

Analysis The uniform connectivity property can be used to prove by induction that all
nodes in the same ring $k$ have the same probability $v_k$ to be reached. The stay
probability $q_k$ is defined as $q_k = p_k / v_k$. Then, from Observation 1, the probability of
selecting a node $x$ of ring $k$ is $p_k = v_k q_k$. What is left to prove is that the value $v_k$
computed in Figure 2 is in fact the visit probability of a node in ring $k$.

Lemma 2. The values $v_k$ computed in Figure 2 are the correct visit probabilities.



task DRW(x, k, δ_k, γ_k, p_k)
  when (v_{k-1}, p_{k-1}, n_{k-1}, δ_{k-1}) received
    n_k ← n_{k-1} · δ_{k-1} / γ_k;  v_k ← n_{k-1} · (v_{k-1} - p_{k-1}) / n_k;  q_k ← p_k / v_k
    with probability q_k do select node x and report to s
    otherwise
      choose a neighbor y in ring k + 1 with uniform probability
      send (v_k, p_k, n_k, δ_k) to y

Fig. 2. Drifting Random Walk algorithm for node x in ring k.
Fig. 3. UNI and PID scenarios without uniform connectivity. Without using the AAP algorithm
(left side) and using it (right side).



Proof. Let us use induction. For $k = 1$ the visit probability of a node in ring 1 is
$1/n_1$, while the value computed by the algorithm is $v_1 = n_0 (v_0 - p_0) / n_1 = 1/n_1$.
For a general $k$, assume the value $v_{k-1}$ is the correct visit probability of ring $k - 1$.
The visit probability of a node in ring $k$ is $v_{k-1} n_{k-1} (1 - q_{k-1}) / n_k$, which replacing
$q_{k-1} = p_{k-1} / v_{k-1}$ yields the expression used in Figure 2 to compute $v_k$.

The above lemma, together with the previous reasoning, proves the following.

Theorem 3. Every node at distance $k$ of the source is selected with probability $p_k$.
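
The per-ring values that a DRW carries can be reproduced offline to see the recurrences at work. The sketch below iterates $n_k = n_{k-1}\delta_{k-1}/\gamma_k$, $v_k = n_{k-1}(v_{k-1} - p_{k-1})/n_k$ and $q_k = p_k/v_k$ for a hypothetical three-ring network; the connectivity values and the function name are assumptions chosen for illustration.

```python
from fractions import Fraction

def ring_parameters(delta, gamma, p):
    """Iterate the per-ring updates; index k runs over rings 0..R (0 = source)."""
    R = len(p) - 1
    # Source values: n_0 = 1, v_0 = 1, and q_0 = 0 because p_0 = 0.
    n, v, q = [1], [Fraction(1)], [Fraction(0)]
    for k in range(1, R + 1):
        # Integer division is exact when the network has uniform connectivity.
        n_k = n[k - 1] * delta[k - 1] // gamma[k]
        v_k = n[k - 1] * (v[k - 1] - p[k - 1]) / n_k
        n.append(n_k)
        v.append(v_k)
        q.append(p[k] / v_k)
    return n, v, q

delta = [4, 2, 3, 0]        # delta_k: neighbors of a ring-k node in ring k+1
gamma = [0, 1, 1, 3]        # gamma_k: neighbors of a ring-k node in ring k-1
p = [Fraction(0)] + [Fraction(1, 20)] * 3     # uniform over the 20 non-source nodes
n, v, q = ring_parameters(delta, gamma, p)
```

In this example the rings have 4, 8 and 8 nodes, the stay probabilities are 1/5, 1/2 and 1, and the product of visit and stay probability equals the target in every ring.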



6 Concentric Rings Networks without Uniform Connectivity 

Finally, we are interested in evaluating, by means of simulations, the performance of
the DRW algorithm variant for concentric rings with uniform connectivity when it is
used on a more realistic topology: a concentric rings network without uniform
connectivity. The experiment has been done in a concentric rings topology of 100 rings
with 100 nodes per ring, where the nodes of each ring are placed uniformly at random on
the ring. This deployment does not guarantee uniform connectivity. In order to establish
the connectivity of nodes, we do a geometric deployment. A node $x$ in ring $k$ is assigned
a position in the ring. This position can be given by an angle $\alpha$. Then, each network
studied has an associated connectivity angle $\beta$, the same for all nodes. This means
that $x$ will be connected to all the nodes in rings $k - 1$ and $k + 1$ whose position
(angle) is in the interval $[\alpha - \beta/2, \alpha + \beta/2]$. We compare the relative error of the DRW



function AssignAttachmentPoints(x, k)
  ap ← LCM(n_k, n_{k+1}) / n_k
  C ← N_{k+1}(x)            /* neighbors of x in ring k + 1 */
  A_x ← ∅                   /* A_x is a multiset */
  loop
    choose c from C
    send ATTACH_MSG to c
    receive RESPONSE_MSG from c
    if RESPONSE_MSG = OK then
      ap ← ap - 1
      add c to A_x          /* c can be in A_x several times */
    else C ← C \ {c}
  until (ap = 0) ∨ (C = ∅)
  if (ap = 0) then return A_x
  else return FAILURE

Fig. 4. Assignment Attachment Points (AAP) Function (left side). Success rate of the AAP
algorithm as a function of the connectivity angle (right side).
algorithm when sampling with two distributions: the uniform distribution (UNI) and
a distribution proportional to the inverse of the distance (PID). We define the relative
error $e_x$ for a node $x$ in a collection $C$ of $s$ samples as $e_x = \frac{|fsim_x - f_x|}{f_x}$, where $fsim_x$ is
the number of instances of $x$ in collection $C$ obtained by the simulator, and $f_x = p_x \cdot s$
is the expected number of instances of $x$ with the ideal probability distribution (UNI
or PID). We compare the error of the DRW algorithm with the error of a generator of
pseudorandom numbers. For each configuration, a collection of 10^ samples has been 
done. 
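
The relative error of a collection of samples can be computed as in the sketch below. It assumes the error of node $x$ is the absolute difference between the observed and expected number of instances, divided by the expected number; the function name and the toy data are illustrative and are not the authors' simulator.

```python
from collections import Counter

def relative_errors(samples, p):
    """Relative error per node: |observed - expected| / expected."""
    counts = Counter(samples)
    s = len(samples)
    # Expected count of x is p[x] * s; nodes never sampled have count 0.
    return {x: abs(counts[x] - p[x] * s) / (p[x] * s) for x in p}

samples = ["a", "b", "a", "c", "a", "b", "a", "a"]
p = {"a": 0.5, "b": 0.25, "c": 0.25}
errors = relative_errors(samples, p)
```

With these eight samples, node `a` appears five times against an expected four, so its relative error is 0.25, while node `b` matches its expectation exactly.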

Figure 3 (left side) presents the results obtained in the UNI and PID scenarios. In
both cases, we can see that the DRW algorithm performs much worse than the UNI
and PID simulators. The simulation results show a biased behavior of the DRW algorithm
because the condition of Eq. (1) is not fulfilled in this experiment (i.e., a node has no
neighbors, or there are two nodes in a ring $k$ that have a different number of neighbors in
rings $k - 1$ or $k + 1$).

AAP Algorithm To eliminate the errors observed when there is no uniform connectiv- 
ity, we propose a simple algorithm to transform the concentric rings network without 
uniform connectivity into an overlay network with uniform connectivity. 

To preserve the property that the visit probability is the same for all the nodes in a
ring, nodes will use different probabilities for different neighbors. Instead of explicitly
computing the probability for each neighbor, we will use the following process.
Consider rings $k$ and $k + 1$. Let $r = \mathrm{LCM}(n_k, n_{k+1})$, where LCM is the least common
multiple function. We assign $\frac{r}{n_k}$ attachment points to each node in ring $k$, and $\frac{r}{n_{k+1}}$
attachment points to each node in ring $k + 1$. Now, the problem is to connect each
attachment point in ring $k$ to a different attachment point in ring $k + 1$ (not necessarily
in different nodes). If this can be done, we can use the algorithm of Figure 2, but when
a DRW is sent to the next ring, an attachment point (instead of a neighbor) is chosen
uniformly. Since the number of attachment points is the same in all nodes of ring $k$
and in all nodes of ring $k + 1$, the impact in the visit probability is that it is again the
same for all nodes of a ring.
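
The number of attachment points per node follows directly from the least common multiple of the two ring sizes. The helper below is a small sketch with a hypothetical name; the ring sizes 6 and 4 are arbitrary example values.

```python
from math import gcd

def attachment_points(n_k, n_next):
    """Return (r, points per node in ring k, points per node in ring k+1)."""
    r = n_k * n_next // gcd(n_k, n_next)     # least common multiple of the ring sizes
    return r, r // n_k, r // n_next

r, a_k, a_next = attachment_points(6, 4)
```

Both rings then expose the same total number `r` of attachment points (6 nodes with 2 points each, 4 nodes with 3 points each), which is what allows a one-to-one matching between them.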

The connection between attachment points can be done with the simple algorithm
presented in Figure 4, in which a node $x$ in ring $k$ contacts its neighbors to request
available attachment points. If a neighbor that is contacted has some free attachment
point, it replies with a response message RESPONSE_MSG with value OK, accepting
the connection. Otherwise it replies to $x$ notifying that all its attachment points have
been connected. The node $x$ continues trying until its $\frac{r}{n_k}$ attachment points have been
connected or none of its neighbors has available attachment points. If this latter situation
arises, then the process has failed. Combining these results with the analysis of Section 5,
we can conclude with the following theorem.

Theorem 4. Using attachment points instead of links and the distributed DRW-based
algorithm of Figure 2, it is possible to sample a concentric rings network without
uniform connectivity with any desired distance-based probability distribution $p_k$, provided
that the algorithm of Figure 4 completes (is successful) in all the nodes.
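
The attachment process between two consecutive rings can be simulated as follows. This is a simplified sketch: nodes of ring $k$ are processed one after another rather than concurrently, message exchange is replaced by direct bookkeeping, and the function name and data layout are assumptions made here.

```python
import random
from math import gcd

def assign_attachment_points(neighbors, n_k, n_next, rng):
    """Sequentially simulate the attachment process for every node of ring k.

    neighbors[x] lists the ring-(k+1) neighbors of x. Returns a dict mapping
    x to its multiset (as a list) of attached neighbors, or None on failure.
    """
    r = n_k * n_next // gcd(n_k, n_next)
    free = {}                       # remaining attachment points per ring-(k+1) node
    attached = {}
    for x, nbrs in neighbors.items():
        ap, candidates, a_x = r // n_k, list(nbrs), []
        while ap > 0 and candidates:
            c = rng.choice(candidates)
            if free.setdefault(c, r // n_next) > 0:    # neighbor answers OK
                free[c] -= 1
                ap -= 1
                a_x.append(c)
            else:                                       # neighbor has no free point
                candidates.remove(c)
        if ap > 0:
            return None                                 # FAILURE at node x
        attached[x] = a_x
    return attached

rng = random.Random(0)
ok = assign_attachment_points({"x1": ["y1", "y2"], "x2": ["y3", "y4"]}, 2, 4, rng)
fail = assign_attachment_points({"x1": ["y1"], "x2": ["y2", "y3", "y4"]}, 2, 4, rng)
```

In the first call each ring-$k$ node needs two attachment points and finds them at its two neighbors. In the second call `x1` has a single neighbor with a single attachment point, so the process cannot complete and the function returns `None`.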

Figure 3 (right side) shows the results when using the AAP algorithm. As we can
see, the differences have disappeared. The conclusion is that, when nodes are placed
uniformly at random and AAP is used to attach neighbors to each node, DRW performs
as well as perfect UNI or PID simulators.

In general, the algorithm of Figure 4 may not complete. The table of Figure 4
(right side) shows the success rate of the algorithm for different connectivity angles.
It can be observed that the success rate is large as long as the connectivity angles are
not very small (at least 60°). (For an angle of 60° the expected number of neighbors in
the next ring for each node is less than 17.) For small angles, like 15° and 30°, the AAP
algorithm is never successful. For these cases, the algorithm for connected networks
presented in Section 3 can be used.